Elasticsearch Query DSL Syntax Notes

TLDR

  • Query DSL Advantages: Compared to Query String, DSL supports nested queries, geospatial queries, custom scoring (Function Score), and more complex boolean logic, while offering a clear structure and precise error messages.
  • Match Query: The core of full-text search. The operator parameter controls logic (OR/AND), minimum_should_match supports flexible count and percentage rules, and fuzziness supports fuzzy matching.
  • Multi Match Query: Provides modes such as best_fields (default, takes the highest score), most_fields (sums scores), and cross_fields (treats multiple fields as a whole), suitable for multi-field searching.
  • Combined Fields Query: Term-centric; treats multiple text fields as a single combined field, suitable for cases where keywords are scattered across titles, summaries, and body text.
  • Range Query: Handles numeric and date ranges. For date queries, it is recommended to explicitly define the format and prioritize string formats to avoid values being parsed as millisecond timestamps.
  • Nested Query: Must be used when the field type is nested. It preserves the relationship between fields within array elements, preventing query errors caused by the flattening of object types.
  • Performance Warning: wildcard and regexp queries have poor performance. Avoid using leading wildcards and limit max_determinized_states to prevent resource exhaustion.

Query DSL vs Query String

In production environments, Query DSL is the recommended choice. Compared to Query String, it offers the following advantages:

  • Functional Completeness: Nested queries, geospatial queries, custom scoring (function_score), and complex boolean logic combinations can only be implemented via Query DSL.
  • Clear Structure: The JSON structure clearly defines query types and parameters, making it easy to maintain and debug, with error messages that precisely point to the problematic field.

Common Query DSL Syntax

1. Match Query - Full-Text Query

Used for full-text search; the query text is analyzed (tokenized) and matching documents are scored by relevance.

  • operator parameter: Controls the logical relationship between multiple tokens. OR is the default; AND requires the document to contain all terms.
  • minimum_should_match: Only effective when operator = "OR". Supports positive integers (absolute count), negative integers (allowed missing count), percentages (rounded down), and conditional combinations (e.g., 3<90%).
  • fuzziness: Only applicable to text fields. It is recommended to set it to AUTO to let Elasticsearch automatically determine the edit distance based on term length.
  • lenient: Defaults to false. Setting it to true allows ignoring fields when types do not match, preventing the query from throwing errors.
  • zero_terms_query: When no tokens remain after analysis, none (default) returns no results, while all returns all documents.
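
As a rough illustration of the parameters above, the sketch below builds a match query body as a plain Python dict and restates the minimum_should_match rules as a small function. The field name title and the helper required_matches are hypothetical; the helper is a simplified reading of the documented rules (one conditional at most), not Elasticsearch's own implementation.

```python
import json


def required_matches(spec: str, term_count: int) -> int:
    """Return how many optional terms must match for a given spec.

    Simplified sketch: handles integers, percentages, and a single
    conditional such as "3<90%" (no space-separated chains).
    """
    spec = spec.strip()
    if "<" in spec:
        bound, rest = spec.split("<", 1)
        if term_count <= int(bound):
            return term_count  # at or below the bound: every term is required
        return required_matches(rest, term_count)
    if spec.endswith("%"):
        percent = int(spec[:-1])
        share = term_count * abs(percent) // 100  # percentage is rounded down
        needed = share if percent >= 0 else term_count - share
    else:
        count = int(spec)
        # a negative integer is the number of terms allowed to be missing
        needed = count if count >= 0 else term_count + count
    return max(needed, 0)


# Hypothetical field "title"; the parameters are the ones described above.
match_query = {
    "query": {
        "match": {
            "title": {
                "query": "quick brown fox jumps over",
                "operator": "OR",
                "minimum_should_match": "3<90%",
                "fuzziness": "AUTO",
                "lenient": False,
                "zero_terms_query": "none",
            }
        }
    }
}

print(json.dumps(match_query, indent=2))
print(required_matches("3<90%", 5))  # 5 terms, 90% rounded down -> 4
```

With three or fewer terms the conditional 3<90% requires every term; above three, the 90% rule takes over.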

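2. Multi Match Query - Multi-Field Query

Searches for the same keywords across multiple fields; the type parameter selects how the per-field matches are combined.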

  • best_fields (default): Takes the highest scoring field; suitable for finding the "best match" in a single field.
  • most_fields: Sums the scores of all fields; suitable for scenarios with "multiple similar fields."
  • cross_fields: Treats multiple fields as one large field; suitable for queries like names or addresses that span across fields.
  • phrase / phrase_prefix / bool_prefix: Special query types for phrases and prefixes, suitable for autocomplete or exact phrase searches.
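
The modes above differ only in the type parameter, which a small builder can make explicit. This is a minimal sketch: the helper multi_match and the field names (title, summary, first_name, last_name) are assumptions for illustration.

```python
import json


def multi_match(text, fields, match_type="best_fields", **options):
    """Build a multi_match query body; extra keyword options are passed through."""
    body = {"query": text, "fields": list(fields), "type": match_type}
    body.update(options)
    return {"query": {"multi_match": body}}


# best_fields: the best single field wins; tie_breaker lets the other fields contribute a little.
best = multi_match("quick brown fox", ["title^2", "summary"], tie_breaker=0.3)

# most_fields: scores from all matching fields are combined.
most = multi_match("quick brown fox", ["title", "summary"], "most_fields")

# cross_fields: a name split across two fields is treated as one large field.
person = multi_match("Will Smith", ["first_name", "last_name"], "cross_fields", operator="and")

print(json.dumps(person, indent=2))
```

The ^2 suffix on title is the standard per-field boost syntax; whether a boost is appropriate depends on the data.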

3. Combined Fields Query - Cross-Field Term Query

Uses a term-centric approach, treating multiple text fields as a single combined field.

  • Limitations: All fields must be of text type and use the same search_analyzer.
  • Advantages: Performs exceptionally well when keywords are scattered across multiple fields (e.g., title, summary, body).
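
A combined_fields request body might look like the following sketch. The field names title, summary, and body are assumed, and all of them would need to be text fields sharing one search analyzer.

```python
import json

# Each term may match in any of the listed fields; with operator "and",
# every term must appear somewhere in the combined field.
combined_query = {
    "query": {
        "combined_fields": {
            "query": "database systems",
            "fields": ["title", "summary", "body"],
            "operator": "and",
        }
    }
}

print(json.dumps(combined_query, indent=2))
```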

4. Match Phrase Query - Phrase Query

Requires terms to appear in the specified order.

  • slop parameter: The maximum number of positions allowed between matching tokens; defaults to 0 (terms must be adjacent and in order). Transposed terms count as a slop of 2.
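
A minimal sketch of a phrase query with a small slop, assuming a hypothetical text field named message:

```python
import json

# slop=1 tolerates one intervening position, so "quick fox" could also
# match text such as "quick brown fox".
phrase_query = {
    "query": {
        "match_phrase": {
            "message": {
                "query": "quick fox",
                "slop": 1,
            }
        }
    }
}

print(json.dumps(phrase_query, indent=2))
```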

5. Term and Terms Query - Exact Match

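  • Term: Matches an exact value; the query value is not analyzed. On a text field the indexed values have already been tokenized (and usually lowercased), so the unanalyzed query value is compared against individual tokens and often fails to match; use match for text fields.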
  • Terms: Similar to SQL's IN query. Supports Terms Lookup, which can retrieve field values from existing documents to use as search criteria.
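
The three forms can be sketched as follows. The field names (status, user_id), the index name users, the document id "2", and the path followers are all hypothetical values chosen for illustration.

```python
import json

# Exact match on a single value (best suited to keyword, numeric, or date fields).
term_query = {"query": {"term": {"status": {"value": "published"}}}}

# Match any of several values, similar to SQL's IN.
terms_query = {"query": {"terms": {"status": ["published", "archived"]}}}

# Terms Lookup: read the list of values from a field of an existing document.
terms_lookup_query = {
    "query": {
        "terms": {
            "user_id": {
                "index": "users",     # index holding the source document
                "id": "2",            # id of the source document
                "path": "followers",  # field whose values become the terms
            }
        }
    }
}

print(json.dumps(terms_lookup_query, indent=2))
```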

6. Range Query - Range Query

Used for numeric and date ranges.

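  • Date Handling: Specify the format explicitly. Without it, the field's mapping format applies (by default strict_date_optional_time||epoch_millis), so a bare number is interpreted as milliseconds since the epoch, which can produce an unintended range or a parsing error when numbers and date strings are mixed. Pass dates as strings consistently.
  • Date Math: Supports expressions anchored on now (e.g., now+1h, now-1d), optionally followed by rounding (e.g., /d, /M). To anchor on a fixed date instead, append || to the date string and then the math expression (e.g., 2024-01-01||+1M/d).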
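
The sketch below shows three range bodies: explicit string dates with a format, date math anchored on now, and date math anchored on a fixed date. The field name created_at and the concrete dates are assumptions.

```python
import json

# Explicit string dates plus format, so nothing is read as epoch milliseconds.
explicit_range = {
    "query": {
        "range": {
            "created_at": {
                "gte": "2024-01-01",
                "lt": "2024-02-01",
                "format": "yyyy-MM-dd",
            }
        }
    }
}

# Date math on "now": from the start of the day seven days ago up to the start of today.
relative_range = {
    "query": {"range": {"created_at": {"gte": "now-7d/d", "lt": "now/d"}}}
}

# Date math on a fixed anchor: the anchor date, then "||", then the expression.
anchored_range = {
    "query": {
        "range": {
            "created_at": {
                "gte": "2024-01-01",
                "lt": "2024-01-01||+1M/d",
                "format": "yyyy-MM-dd",
            }
        }
    }
}

print(json.dumps(anchored_range, indent=2))
```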

7. Exists Query - Field Existence Query

Matches documents that contain an indexed value for a field; a value of null or an empty array [] counts as missing.

  • Inverse Query: Use a bool query combined with must_not and exists.
  • Notes: An exists query cannot detect a field when its mapping sets both index: false and doc_values: false, or when a value exceeds the ignore_above limit.
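
Both directions can be sketched as plain request bodies; the field name user.id is a hypothetical example.

```python
import json

# Documents that have an indexed value for the field.
has_field = {"query": {"exists": {"field": "user.id"}}}

# Inverse: documents where the field is missing, via bool + must_not.
missing_field = {
    "query": {
        "bool": {
            "must_not": {"exists": {"field": "user.id"}}
        }
    }
}

print(json.dumps(missing_field, indent=2))
```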

8. Prefix, Wildcard, and Regexp Query

  • Prefix: Queries documents starting with a specific string.
  • Wildcard: Uses * (zero or more characters) and ? (exactly one character) for pattern matching. Avoid leading wildcards (e.g., *term), which force Elasticsearch to iterate over a large share of the indexed terms.
  • Regexp: Supports regular expressions in Lucene syntax. It is usually the most expensive of the three; avoid it whenever possible. Lucene does not support the ^ and $ anchors; a pattern is always anchored and must match the entire string.
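
A sketch of the three query bodies, with a small client-side guard against leading wildcards. The guard has_leading_wildcard is a hypothetical helper, the field name user.id is assumed, and the max_determinized_states value is only an example limit.

```python
import json


def has_leading_wildcard(pattern: str) -> bool:
    """Client-side check: True if the pattern starts with * or ?."""
    return pattern.startswith(("*", "?"))


prefix_query = {"query": {"prefix": {"user.id": {"value": "ki"}}}}

wildcard_query = {"query": {"wildcard": {"user.id": {"value": "ki*y"}}}}

# No ^ or $: the pattern already has to match the whole string.
regexp_query = {
    "query": {
        "regexp": {
            "user.id": {
                "value": "k.*y",
                "max_determinized_states": 10000,  # cap on automaton states
            }
        }
    }
}

pattern = wildcard_query["query"]["wildcard"]["user.id"]["value"]
if has_leading_wildcard(pattern):
    raise ValueError("leading wildcard rejected: " + pattern)

print(json.dumps(regexp_query, indent=2))
```

Rejecting such patterns before they reach the cluster is one possible design choice; the original notes only advise avoiding them.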

9. Fuzzy Query - Fuzzy Query

Fault-tolerant query that allows for spelling errors.

  • Recommendation: For text fields, prioritize using the match query with the fuzziness parameter instead of the fuzzy query directly to ensure the query terms are processed by the analyzer.
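
To make the recommendation concrete, the sketch below places a direct fuzzy query next to a match query with fuzziness, and restates the documented AUTO thresholds (0 to 2 characters: exact match, 3 to 5: one edit, more than 5: two edits) as a function. The helper auto_edit_distance and the field name title are hypothetical, and the helper only approximates what Elasticsearch does internally.

```python
import json


def auto_edit_distance(term: str) -> int:
    """Edit distance allowed by fuzziness AUTO for a term of this length."""
    length = len(term)
    if length <= 2:
        return 0  # very short terms must match exactly
    if length <= 5:
        return 1
    return 2


# Direct fuzzy query: the value is NOT analyzed, so case and tokenization matter.
fuzzy_query = {
    "query": {"fuzzy": {"title": {"value": "elasticsaerch", "fuzziness": "AUTO"}}}
}

# Preferred for text fields: the query text goes through the analyzer first.
match_fuzzy_query = {
    "query": {
        "match": {"title": {"query": "Elasticsaerch Guide", "fuzziness": "AUTO"}}
    }
}

print(json.dumps(match_fuzzy_query, indent=2))
print(auto_edit_distance("elasticsaerch"))  # 13 characters -> 2
```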

10. Nested Query - Nested Object Query

Used to query nested type fields, preserving the internal relationships of array elements.

  • Problem Scenario: With the object type, an array of objects is flattened into per-field value lists, so the association between values of the same element is lost. For example, a query for "John gave 5 stars" can wrongly match a document in which John gave 3 stars and a different reviewer gave 5.
  • Solution: Define the field as a nested type and use the nested query to ensure conditions match "within the same sub-document."
  • inner_hits: This parameter can be used to retrieve the specific nested objects that matched the criteria, rather than just returning the parent document.
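
The flattening problem can be simulated in a few lines of plain Python, followed by a nested query body with inner_hits. The sample reviews, the second reviewer Mary, and the field names reviews.author and reviews.stars are assumptions for illustration; the simulation mimics the behaviour and does not call Elasticsearch.

```python
import json

# One parent document with two reviews.
reviews = [
    {"author": "John", "stars": 3},
    {"author": "Mary", "stars": 5},
]

# object type: the array is flattened into one value list per field.
flattened = {
    "reviews.author": [r["author"] for r in reviews],
    "reviews.stars": [r["stars"] for r in reviews],
}
# "John" and 5 each appear somewhere, so the document matches (a false positive).
object_style_match = "John" in flattened["reviews.author"] and 5 in flattened["reviews.stars"]

# nested type: both conditions must hold inside the same sub-document.
nested_style_match = any(r["author"] == "John" and r["stars"] == 5 for r in reviews)

nested_query = {
    "query": {
        "nested": {
            "path": "reviews",
            "query": {
                "bool": {
                    "must": [
                        {"match": {"reviews.author": "John"}},
                        {"term": {"reviews.stars": 5}},
                    ]
                }
            },
            "inner_hits": {},  # also return the nested objects that matched
        }
    }
}

print(object_style_match, nested_style_match)  # True False
print(json.dumps(nested_query, indent=2))
```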

Change Log

  • 2025-11-04 Initial document created.